1 Introduction

Research on integrated neural-symbolic systems has made significant progress 
in the recent past. In particular the understanding of ways to deal with 
symbolic knowledge within connectionist systems (also called artificial neu- 
ral networks) has reached a critical mass which enables the community to 
strive for applicable implementations and use cases. Recent work has cov- 
ered a great variety of logics used in artificial intelligence and provides a 
multitude of techniques for dealing with them within the context of artificial 
neural networks. 

Already in the pioneering days of computational models of neural cogni- 
tion, the question was raised how symbolic knowledge can be represented 
and dealt with within neural networks. The landmark paper [McCulloch and Pitts, 1943] 
provides fundamental insights into how propositional logic can be processed
using simple artificial neural networks. Within the following decades, however,
the topic did not receive much attention as research in artificial intelligence 
initially focused on purely symbolic approaches. The power of machine 
learning using artificial neural networks was not recognized until the 1980s,
when in particular the backpropagation algorithm [Rumelhart et al., 1986]
made connectionist learning feasible and applicable in practice. 

These advances indicated a breakthrough in machine learning which 
quickly led to industrial-strength applications in areas such as image analy- 
sis, speech and pattern recognition, investment analysis, engine monitoring, 
fault diagnosis, etc. During a training process from raw data, artificial neu- 
ral networks acquire expert knowledge about the problem domain, and the 
ability to generalize this knowledge to similar but previously unencountered 
situations in a way which often surpasses the abilities of human experts. The 
knowledge obtained during the training process, however, is hidden within 
Figure 1. Neural-symbolic learning cycle



the acquired network architecture and connection weights, and not directly 
accessible for analysis, reuse, or improvement, thus limiting the range of 
applicability of the neural networks technology. For these purposes, the 
knowledge would be required to be available in structured symbolic form, 
most preferably expressed using some logical framework. 

Likewise, in situations where partial knowledge about an application do- 
main is available before the training, it would be desirable to have the means 
to guide connectionist learning algorithms using this knowledge. This is the 
case in particular for learning tasks which traditionally fall into the realm of 
symbolic artificial intelligence, and which are characterized by complex and 
often recursive interdependencies between symbolically represented pieces 
of knowledge. 

The arguments just given indicate that an integration of connectionist 
and symbolic approaches in artificial intelligence provides the means to ad- 
dress machine learning bottlenecks encountered when the paradigms are 
used in isolation. Research relating the paradigms came into focus when 
the limitations of purely connectionist approaches became apparent. The 
corresponding research turned out to be very challenging and produced a 
multitude of very diverse approaches to the problem. Integrated systems in 
the sense of this survey are those where symbolic processing functionalities 
emerge from neural structures and processes. 

Most of the work in integrated neural-symbolic systems addresses the 
neural-symbolic learning cycle depicted in Figure 1. A front-end (symbolic 
system) is used to feed symbolic (partial) expert knowledge to a neural or 
connectionist system which can be trained on raw data, possibly taking 
the internally represented symbolic knowledge into account. Knowledge 
acquired through the learning process can then be extracted back to the 
symbolic system (which now also acts as a back-end), and made available 
for further processing in symbolic form. Studies often address only parts of 
the neural-symbolic learning cycle (like the representation or extraction of
knowledge), but can be considered to be part of the overall investigations 
concerning the cycle. 

We assume that the reader has a basic familiarity with artificial neural 
networks and symbolic artificial intelligence, as conveyed by any introduc- 
tory courses or textbooks on the topic, e.g. in [Russell and Norvig, 2003]. 
However, we will refrain from going into technical detail at any point, but 
rather provide ample references which can be followed up with ease. The
selection of research results which we will discuss in the process is naturally
subjective and driven by our own specific research interests. Nevertheless, 
we hope that this survey also provides a helpful and comprehensive albeit 
unusual literature overview of neural-symbolic integration.

This chapter is structured as follows. In Section 2, we introduce some of
the integrated neural-symbolic systems which we consider to be foundational
for the majority of the work undertaken within the last decade. In
Section 3, we will explain our proposal for a classification scheme. In Section 
4, we will survey recent literature by means of our classification. Finally, in 
Section 5, we will give an outlook on possible further developments. 

2 Neural-Symbolic Systems 

As a reference for later sections, we will review some well-known systems 
here. We will start with the landmark results by McCulloch and Pitts, which 
relate finite automata and neural networks [McCulloch and Pitts, 1943]. 
Then we will discuss a method for representing structured terms in a
connectionist system, namely the recursive autoassociative memories (RAAM)
[Pollack, 1990]. The SHRUTI System, proposed in [Shastri and Ajjanagadde, 1993], 
is discussed next. Finally, Connectionist Model Generation using the Core 
Method is introduced as proposed in [Holldobler and Kalinke, 1994]. These
approaches lay the foundations for most of the more recent work on neural- 
symbolic integration which we will discuss in this chapter. 

2.1 Neural Networks and Finite Automata 

The advent of automata theory and of artificial neural networks also marked
the advent of neural-symbolic integration. In their seminal paper
[McCulloch and Pitts, 1943] Warren Sturgis McCulloch and Walter Pitts 
showed that there is a strong relation between symbolic systems and arti- 
ficial neural networks. In particular, they showed that for each finite state 
machine there is a network constructed from binary threshold units - and 
vice versa - such that the input-output behaviour of both systems coincides.
This is due to the fact that simple logical connectives such as conjunction, 
disjunction and negation can easily be encoded using binary threshold units, 
with weights and thresholds set appropriately. To illustrate the ideas, we
will discuss a simple example in the sequel. 
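
The encoding of logical connectives by binary threshold units can be sketched in a few lines. The following is a minimal illustration, not taken from the cited paper; the helper name threshold_unit and the particular weights and thresholds are assumptions, and other settings work equally well.

```python
def threshold_unit(weights, threshold):
    """Return a binary threshold unit: it outputs 1 iff the weighted
    sum of its inputs reaches the threshold, and 0 otherwise."""
    def unit(*inputs):
        total = sum(w * x for w, x in zip(weights, inputs))
        return 1 if total >= threshold else 0
    return unit

# Illustrative weight/threshold choices (assumptions, not from the source).
AND = threshold_unit([1, 1], 2)   # fires only if both inputs are 1
OR = threshold_unit([1, 1], 1)    # fires if at least one input is 1
NOT = threshold_unit([-1], 0)     # fires only if the input is 0

if __name__ == "__main__":
    for x in (0, 1):
        for y in (0, 1):
            print(x, y, "AND:", AND(x, y), "OR:", OR(x, y))
    print("NOT 0:", NOT(0), "NOT 1:", NOT(1))
```

Since every propositional formula can be built from these three connectives, stacking such units in layers yields a network for any such formula.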

EXAMPLE 1. Figure 2 on the left shows a simple Moore machine, which is
a finite state machine with outputs attached to the states [Hopcroft and Ullman, 1989].
The corresponding network is shown on the right. The network consists of
four layers. For each output symbol (0, 1) there is a unit in the output layer,
and for each input symbol (a, b) a unit in the right part of the input layer.
Furthermore, for each state (q0, q1) of the automaton, there is a unit in the
state layer and in the left part of the input layer. In our example, there are
two ways to reach the state q1, namely by being in state q1 and receiving
an 'a', or by being in state q0 and receiving a 'b'. This is implemented by
using a disjunctive neuron in the state layer receiving inputs from two
conjunctive units in the gate layer, which are connected to the corresponding
conditions, e.g. being in state q0 and reading a 'b'.
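
The layered construction of Example 1 can be sketched as follows. Only the two transitions into q1 are stated in the text; the transitions into q0, the output attached to each state and the start state are assumptions made here so that the sketch is complete, and all identifiers are hypothetical.

```python
def fires(weighted_sum, threshold):
    """Binary threshold activation."""
    return 1 if weighted_sum >= threshold else 0

STATES = ["q0", "q1"]
SYMBOLS = ["a", "b"]

# Transition table. The two entries leading to q1 are taken from the text;
# the two entries leading to q0 are ASSUMED to complete the automaton.
DELTA = {("q1", "a"): "q1", ("q0", "b"): "q1",
         ("q0", "a"): "q0", ("q1", "b"): "q0"}

# ASSUMED output attached to each state (the figure is not available here).
OUTPUT = {"q0": 0, "q1": 1}

def step(state_act, symbol_act):
    """One pass through the gate, state and output layers.
    Both arguments are one-hot dicts of 0/1 activations."""
    # Gate layer: one conjunctive unit per (state, symbol) pair, threshold 2.
    gate = {(q, s): fires(state_act[q] + symbol_act[s], 2)
            for q in STATES for s in SYMBOLS}
    # State layer: one disjunctive unit per state, threshold 1.
    new_state = {q: fires(sum(g for pair, g in gate.items()
                              if DELTA[pair] == q), 1)
                 for q in STATES}
    # Output layer: one unit per output symbol, fed by the states emitting it.
    out = {o: fires(sum(new_state[q] for q in STATES if OUTPUT[q] == o), 1)
           for o in (0, 1)}
    return new_state, out

def run(word, start="q0"):
    """Feed a word symbol by symbol; return the output after each step.
    The start state q0 is an assumption."""
    state = {q: int(q == start) for q in STATES}
    outputs = []
    for ch in word:
        symbol = {s: int(s == ch) for s in SYMBOLS}
        state, out = step(state, symbol)
        outputs.append(next(o for o, act in out.items() if act))
    return outputs

if __name__ == "__main__":
    print(run("bab"))
```

Feeding the new state activations back into the left part of the input layer is what turns the feedforward pass into a simulation of the automaton.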

A network of n binary threshold units can be in only 2^n different states,
and the change of state depends only on the current state and the current
input to the network. These states and transitions can easily be encoded as
a finite automaton, using a straightforward translation
[McCulloch and Pitts, 1943, Kleene, 1956]. An extension to the class of
weighted automata is given in [Bader et al., 2004a].
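
The converse direction, from a network to an automaton, amounts to enumerating all activation patterns. The sketch below assumes a synchronously updated recurrent network; the weight matrices and the function names are hypothetical and serve only as an illustration.

```python
from itertools import product

def next_state(state, inp, w_rec, w_in, thresholds):
    """Synchronous update of all binary threshold units."""
    new = []
    for j, theta in enumerate(thresholds):
        total = sum(w * s for w, s in zip(w_rec[j], state))
        total += sum(w * x for w, x in zip(w_in[j], inp))
        new.append(1 if total >= theta else 0)
    return tuple(new)

def extract_automaton(w_rec, w_in, thresholds):
    """Transition table {(state, input): next state} over all 2^n states."""
    n_units, n_inputs = len(thresholds), len(w_in[0])
    return {(s, x): next_state(s, x, w_rec, w_in, thresholds)
            for s in product((0, 1), repeat=n_units)
            for x in product((0, 1), repeat=n_inputs)}

# Hypothetical 2-unit network: unit 0 copies the input, unit 1 copies unit 0.
W_REC = [[0, 0], [1, 0]]
W_IN = [[1], [0]]
THETA = [1, 1]

if __name__ == "__main__":
    for key, value in sorted(extract_automaton(W_REC, W_IN, THETA).items()):
        print(key, "->", value)
```

The table has one row per pair of network state and input pattern, so its size grows exponentially in the number of units.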

2.2 Connectionist Term Representation 

The representation of possibly infinite structures in a finite network is one of
the major obstacles on the way to neural-symbolic integration [Bader et al., 2004b].

Table 1. Extracted training samples from the tree shown in Figure 3.

Figure 3. Example tree and a RAAM for binary trees.



One attempt to solve this will be discussed in this section, namely the idea of 
recursive autoassociative memories (RAAMs) as introduced in [Pollack, 1990], 
where a fixed length representation of variable sized data is obtained by 
training an artificial neural network using backpropagation. Again, we will 
try to illustrate the ideas by discussing a simple example. 

EXAMPLE 2. Figure 3 shows a small binary tree which shall be encoded 
in a fixed-length real vector. The resulting RAAM-network is depicted in 
Figure 3, where each box depicts a layer of 4 units. The network is trained as 
an encoder-decoder network, i.e. it reproduces the input activations in the 
output layer [Bishop, 1995]. In order to do this, it must create a compressed 
representation in the hidden layer. Table 1 shows the activations of the 
layers during the training of the network. As the training converges we 
shall have A = A', B = B', etc. To encode the terminal symbols A, B,
C and D we use the vectors (1,0,0,0), (0,1,0,0), (0,0,1,0) and (0,0,0,1)
respectively. The representations of R1, R2 and R3 are obtained during
training. After training the network, it is sufficient to keep the internal
representation R3, since it contains all necessary information for recreating
the full tree. This is done by plugging it into the hidden layer and recursively 
using the output activations, until binary vectors, hence terminal symbols, 
are reached. 
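
The bookkeeping around a RAAM, though not the training itself, can be sketched as follows. The tree shape ((A B) (C D)) is an assumption that is merely consistent with the symbols R1, R2 and R3 used above, and the tolerance used to recognise terminal vectors is likewise hypothetical.

```python
# 1-of-n codes for the terminal symbols, as given in Example 2.
TERMINALS = {"A": (1, 0, 0, 0), "B": (0, 1, 0, 0),
             "C": (0, 0, 1, 0), "D": (0, 0, 0, 1)}

def training_samples(tree):
    """List the autoassociation steps (left, right, code name) in
    post-order, so that children are encoded before their parent."""
    samples = []

    def visit(node):
        if isinstance(node, str):          # terminal symbol
            return node
        left, right = (visit(child) for child in node)
        name = f"R{len(samples) + 1}"      # label of this node's hidden code
        samples.append((left, right, name))
        return name

    visit(tree)
    return samples

def nearest_terminal(vector, tolerance=0.2):
    """Return the terminal whose code is within `tolerance` of `vector`
    in every component, or None if the vector is a non-terminal code.
    The tolerance value is an assumption."""
    for symbol, code in TERMINALS.items():
        if all(abs(v - c) <= tolerance for v, c in zip(vector, code)):
            return symbol
    return None

if __name__ == "__main__":
    assumed_tree = (("A", "B"), ("C", "D"))   # ASSUMED shape of the example tree
    for sample in training_samples(assumed_tree):
        print(sample)
    print(nearest_terminal((0.9, 0.1, 0.05, 0.0)))
```

During decoding, a check like nearest_terminal decides whether an output vector is a leaf or has to be fed back into the hidden layer for further expansion.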

While recreating the tree from its compressed representation, it is necessary
to distinguish terminal and non-terminal vectors, i.e. those which represent
leaves of the trees from those representing inner nodes. Due to noise or
inaccuracy, it can be very hard to recognise the "1-of-n" vectors representing
terminal symbols. In order to circumvent this problem different solutions were
proposed, which can be found in [Stolcke and Wu, 1992, Sperduti, 1994a,
Sperduti, 1994b]. The ideas described above for binary trees apply also
to trees with larger, but fixed, branching factors, by simply using bigger
input and output layers. In order to store sequences of data, a version
called S-RAAM (for sequential RAAM) can be used [Pollack, 1990]. In
[Blair, 1997] modifications were proposed to allow the storage of deeper and
more complex data structures than before, but their applicability remains
to be shown [Kalinke, 1997]. Other recent approaches for enhancement
have been studied e.g. in [Sperduti et al., 1995, Kwasny and Kalman, 1995,
Sperduti et al., 1997, Hammerton, 1998, Adamson and Damper, 1999], which
also include some applications. A recent survey which includes RAAM
architectures and addresses structured processing can be found in
[Frasconi et al., 2001]. The related approach of holographic reduced
representations (HRRs) [Plate, 1991, Plate, 1995] also uses fixed-length
representations of variable-sized data, but using different methods.

Rules                             Facts
Owns(y, z) ← Gives(x, y, z)       Gives(john, josephine, book)
Owns(x, y) ← Buys(x, y)           Buys(carl, x)
Can-sell(x, y) ← Owns(x, y)       Owns(josephine, ball)

Table 2. A knowledge base for Owns and Can-sell

2.3 Reflexive Connectionist Reasoning 

A wide variety of tasks can be solved by humans very fast and efficiently. 
This type of reasoning is sometimes referred to as reflexive reasoning. The 
SHRUTI system [Shastri and Ajjanagadde, 1993] provides a connectionist 
architecture performing this type of reasoning. Relational knowledge is 
encoded by clusters of cells and inferences by means of rhythmic activity 
over the cell clusters. It can encode a (function-free) fragment of
first-order predicate logic analyzed in [Holldobler et al., 1999b]. Binding of
variables - a particularly difficult aspect of neural-symbolic integration - is 
obtained by time-synchronization of activities of neurons. 

EXAMPLE 3. Table 2 shows a knowledge base describing what it means to 
own something and to be able to sell it. Furthermore it states some facts. 
The resulting SHRUTI network is shown in Figure 4. 
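
The idea of binding variables by synchronous activity can be illustrated with a very loose abstraction. The sketch below is not the SHRUTI architecture: it replaces rhythmic neural activity by integer phase labels, allows only one instantiation per predicate, and uses only the fact Gives(john, josephine, book) from Table 2. The phase assignment and all identifiers are assumptions.

```python
# Each constant is assigned its own phase; argument units firing in the
# same phase as a constant are considered bound to that constant.
PHASE = {"john": 0, "josephine": 1, "book": 2}

# Rules from Table 2 as argument wirings: argument i of the consequent
# copies the phase of argument wiring[i] of the antecedent.
RULES = [
    ("Gives", "Owns", (1, 2)),      # Owns(y, z) <- Gives(x, y, z)
    ("Buys", "Owns", (0, 1)),       # Owns(x, y) <- Buys(x, y)
    ("Owns", "Can-sell", (0, 1)),   # Can-sell(x, y) <- Owns(x, y)
]

def propagate(active):
    """Spread argument phases along the rules until nothing changes.
    `active` maps a predicate name to the tuple of its argument phases."""
    active = dict(active)
    changed = True
    while changed:
        changed = False
        for src, dst, wiring in RULES:
            if src in active and dst not in active:
                active[dst] = tuple(active[src][i] for i in wiring)
                changed = True
    return active

def read_bindings(active):
    """Translate phases back into the constants firing in that phase."""
    by_phase = {p: c for c, p in PHASE.items()}
    return {pred: tuple(by_phase[p] for p in phases)
            for pred, phases in active.items()}

if __name__ == "__main__":
    # The fact Gives(john, josephine, book) activates the Gives arguments.
    result = propagate({"Gives": (PHASE["john"], PHASE["josephine"], PHASE["book"])})
    print(read_bindings(result))
```

In this abstraction no symbol is ever copied between predicates; only the firing phase is passed on, which is the essential point of the time-synchronization approach.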

Recent enhancements, as reported in [Shastri, 1999] and [Shastri and Wendelken, 1999],
allow e.g. the support of negation and inconsistency. [Wendelken and Shastri, 2003]
adds very basic learning capabilities to the system, while [Wendelken and Shastri, 2004]
addresses the problem of multiple reuse of knowledge rules, an aspect which
limits the capabilities of SHRUTI.

Figure 4. A SHRUTI network for the knowledge base from Table 2. Each
predicate is represented by two relay units and a set of argument units.
Constants are represented as units in the upper right. Facts are implemented
using a separate kind of unit.

2.4 Connectionist Model Generation using the Core Method 

In 1994, Holldobler and Kalinke proposed a method to translate a propo- 
sitional logic program into a neural network [Holldobler and Kalinke, 1994] 
(a revised treatment is contained in [Hitzler et al., 2004]), such that the
network will settle down in a state corresponding to a model of the pro- 
gram. To achieve this goal, not the program itself, but rather the associated 
consequence operator was implemented using a connectionist system. The 
realization is close in spirit to [McCulloch and Pitts, 1943], and Figure 5 
shows a propositional logic program and the corresponding network. 

EXAMPLE 4. The simple logic program in Figure 5 states that a is a fact, 
b follows from a, etc. This "follows-from" is usually captured by the associated
consequence operator T_P [Lloyd, 1988]. The figure also shows the corresponding
network, obtained by the algorithm given in [Holldobler and Kalinke, 1994].
For each atom (a, b, c, d, e) there is a unit in the input- and output layer, 
whose activation represents the truth value of the corresponding atom.
Furthermore, for each rule in the program there is a unit in the hidden layer,
acting as a conjunction. If all requirements are met, this unit becomes active 
and propagates its activation to the consequence-unit in the output layer. 

It can be shown that every logic program can be implemented using a 
3-layer network of binary threshold units, and that 2-layer networks do 
not suffice. It was also shown that under some syntactic restrictions on 
the programs, their semantics could be recovered by recurrently connect- 
ing the output- and the input layer of the network (as indicated in Figure 
5) and propagating activation exhaustively through the resulting recurrent 
network. The key idea of [Holldobler and Kalinke, 1994] was to represent logic
programs by means of their associated semantic operators, i.e. by a connectionist
encoding of an operator which captures the meaning of the program,
instead of encoding the program directly. More precisely, the functional
input-output behaviour of a semantic operator T_P associated with a given
program P is encoded by means of a feedforward neural network N_P which,
when presented with an encoding of some I at its input nodes, produces T_P(I)
at its output nodes. Output nodes can also be connected recurrently back
to the input nodes, resulting in a connectionist computation of iterates of I
under T_P, as used e.g. in the computation of the semantics or meaning of
P [Lloyd, 1988]. Here, I is a (Herbrand) interpretation for P, and
T_P is a mapping on the set I_P of all (Herbrand) interpretations for P.
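
The computation of T_P by a 3-layer threshold network, and its iteration through the recurrent connection, can be sketched as follows. The program below is hypothetical and is not the program of Figure 5; only its first two clauses follow the description in Example 4. The concrete weights and thresholds are one possible choice and need not coincide with those of the cited algorithm.

```python
ATOMS = ["a", "b", "c", "d", "e"]

# Hypothetical program; each rule is (head, positive body, negated body).
PROGRAM = [
    ("a", [], []),            # a.
    ("b", ["a"], []),         # b <- a.
    ("c", ["a", "b"], []),    # c <- a, b.
    ("d", ["e"], []),         # d <- e.
    ("e", ["d"], []),         # e <- d.
]

def build_hidden_layer(program, atoms):
    """One hidden unit per rule: weight +1 for positive body atoms, -1 for
    negated ones, threshold = number of positive body atoms."""
    hidden = []
    for head, pos, neg in program:
        weights = {x: (1 if x in pos else -1 if x in neg else 0) for x in atoms}
        hidden.append((weights, len(pos), head))
    return hidden

def tp(hidden, atoms, interpretation):
    """One feedforward pass: maps the interpretation I to T_P(I)."""
    x = {atom: int(atom in interpretation) for atom in atoms}
    hidden_act = [int(sum(w[atom] * x[atom] for atom in atoms) >= theta)
                  for w, theta, _ in hidden]
    # Output unit for an atom: disjunction (threshold 1) over its rules.
    return {atom for atom in atoms
            if sum(act for act, (_, _, head) in zip(hidden_act, hidden)
                   if head == atom) >= 1}

def iterate(hidden, atoms, start=(), max_steps=100):
    """Feed outputs back to the inputs until a fixed point is reached.
    Returns None if no fixed point is found within max_steps."""
    current = set(start)
    for _ in range(max_steps):
        following = tp(hidden, atoms, current)
        if following == current:
            return current
        current = following
    return None

if __name__ == "__main__":
    net = build_hidden_layer(PROGRAM, ATOMS)
    print(sorted(iterate(net, ATOMS)))
```

Starting from the empty interpretation, the iteration for this definite program settles in its least model; for programs with negation a fixed point need not be reached, which is why the sketch bounds the number of steps.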

This idea for the representation of logic programs spawned several
investigations in different directions. Whereas [Holldobler and Kalinke, 1994]
employed binary threshold units as activation functions of the network nodes,
the results were lifted to sigmoidal and hence differentiable activation
functions in [Garcez et al., 1997, Garcez and Zaverucha, 1999]. This way, the
connectionist representation of logic programs resulted in a network
architecture which could be trained using standard backpropagation
algorithms. The resulting connectionist inductive learning and reasoning
system CILP was completed by providing corresponding knowledge extraction
algorithms [Garcez et al., 2001]. Further extensions to this include modal
[Garcez et al., 2002b] and intuitionistic logics [Garcez et al., 2003].
Metalevel priorities between rules were introduced in [Garcez et al., 2000]. An
in-depth treatment of the whole approach can be found in [Garcez et al., 2002a].
The knowledge based artificial neural networks (KBANN) [Towell and Shavlik, 1994]
are closely related to this approach, by using similar techniques to implement
propositional logic formulae within neural networks, but with a focus
on learning.

Another work following up on [Holldobler and Kalinke, 1994] concerns 
the connectionist treatment of first-order logic programming. [Seda, 2005] 
and [Seda and Lane, 2005] approach this by approximating given first-order 
programs P by finite subprograms of the grounding of P. These subpro- 
grams can be viewed as propositional ones and encoded using the original al- 
gorithm from [Holldobler and Kalinke, 1994]. [Seda, 2005] and [Seda and Lane, 2005] 
show that arbitrarily accurate encodings are possible for certain programs 
including definite ones (i.e. programs not containing negation as failure). 
They also lift their results to logic programming under certain multi-valued
logics. 

A more direct approach to the representation of first-order logic programs 
based on [Holldobler and Kalinke, 1994] was pursued in [Holldobler et al., 1999a,
Hitzler and Seda, 2000, Hitzler et al., 2004, Hitzler, 2004, Bader et al., 2005a,
Bader et al., 2005b]. The basic idea again is to represent semantic operators
T_P : I_P → I_P instead of the program P directly. In [Holldobler and Kalinke, 1994]
this was achieved by assigning propositional variables to nodes, whose
activations indicate whether the variables are true or false within the currently
represented interpretation. In the propositional setting this is possible be- 
cause for any given program only a finite number of truth values of propo- 
sitional variables plays a role - and hence the finite network can encode 
finitely many propositional variables in the way indicated. For first-order 
programs, infinite interpretations have to be taken into account, thus an en- 
coding of ground atoms by one neuron each is impossible as it would result 
in an infinite network, which is not computationally feasible to work with. 

The solution put forward in [Holldobler et al, 1999a] is to employ the 
capability of standard feedforward networks to propagate real numbers. 
The problem is thus reduced to encoding I_P as a set of real numbers in
a computationally feasible way, and to providing means to actually construct
the networks starting from their input-output behaviour. Since sigmoidal
units can be used, the resulting networks are trainable by backpropagation.
[Holldobler et al., 1999a] spelled out these ideas in a limited setting
for a small class of programs; they were lifted in [Hitzler and Seda, 2000,
Hitzler et al., 2004] to a more general setting, including the treatment of
multi-valued logics. [Hitzler, 2004] related the results to logic programming
under non-monotonic semantics. In these reports, it was shown that
approximation of logic programs by means of standard feedforward networks
is possible up to any desired degree of accuracy, and for fairly general classes
of programs. However, no algorithms for practical generation of
approximating networks from given programs could be presented. This was finally
done in [Bader et al., 2005b], and implementations of the approach are
currently under way, and shall yield a first-order integrated neural-symbolic
system with capabilities similar to those of the propositional system CILP.
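
One simple way to encode an interpretation as a real number is to let every ground atom contribute one digit of a number written in some base. The sketch below assumes an injective level mapping from ground atoms to positive integers and the base 4; both are illustrative assumptions and not necessarily the construction used in the cited papers. Exact fractions are used to avoid rounding effects.

```python
from fractions import Fraction

# ASSUMED injective level mapping for a few ground atoms of a hypothetical
# program with one unary predicate p and terms 0, s(0), s(s(0)), ...
LEVEL = {"p(0)": 1, "p(s(0))": 2, "p(s(s(0)))": 3}

def embed(interpretation, level=LEVEL, base=4):
    """Map a set of ground atoms to the sum of base**(-level(atom))."""
    return sum((Fraction(1, base ** level[atom]) for atom in interpretation),
               Fraction(0))

def decode(value, level=LEVEL, base=4):
    """Recover the atoms from the embedded value, most significant digit
    first. This works because the digits after any position sum to less
    than the value of that position when the base is at least 3."""
    atoms = set()
    for atom, lvl in sorted(level.items(), key=lambda item: item[1]):
        digit = Fraction(1, base ** lvl)
        if value >= digit:
            atoms.add(atom)
            value -= digit
    return atoms

if __name__ == "__main__":
    interpretation = {"p(0)", "p(s(s(0)))"}
    code = embed(interpretation)
    print(code, decode(code) == interpretation)
```

Under such an encoding the operator T_P becomes a function on real numbers, which is the kind of function a standard feedforward network can approximate.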

There exist two alternative approaches to the representation of first-order 
logic programs via their semantic operators, which have not been studied in 
more detail yet. The first approach, reported in [Bader and Hitzler, 2004], 
uses insights from fractal geometry as in [Barnsley, 1993] to construct it- 
erated function systems whose attractors correspond to fixed points of the 
semantic operators. The second approach builds on Gabbay's Fibring logics 
[Gabbay, 1999], and the corresponding Fibring Neural Networks [Garcez and Gabbay, 2004]. 
The resulting system, presented in [Bader et al., 2005a], employs the fibring
idea to control the firing of nodes such that it corresponds to term matching 
within a logic programming system. It is shown that certain limited kinds 
of first-order logic programs can be encoded this way, such that their models 
can be computed using the network. 

3 A New Classification Scheme 

In this section we will introduce a classification scheme for neural-symbolic 
systems. This way, we intend to bring some order to the heterogeneous field 
of research, whose individual approaches are often largely incomparable. We 
suggest using a scheme consisting of three main axes as depicted in Figure 6,
namely Interrelation, Language and Usage. 

For the interrelation-axis, depicted in Figure 7, we roughly follow the 
scheme introduced and discussed in [Hilario, 1995, Hatzilygeroudis and Prentzas, 2004], 
but adapted to the particular focus which we will put forward. In particular, 
the classifications presented in [Hilario, 1995, Hatzilygeroudis and Prentzas, 2004] 
strive to depict each system at exactly one point in a taxonomic tree. From 
our perspective, certain properties or design decisions of systems are rather 
independent, and should be understood as different dimensions. From
this perspective, approaches can first be divided into two main classes,
namely integrated (called unified or translational in [Hilario, 1995,
Hatzilygeroudis and Prentzas, 2004]) and hybrid systems. Integrated systems
are those in which full symbolic processing functionalities emerge from neural
structures and processes - further details will be discussed in Section 4.1.
Integrated systems can be further subdivided into neuronal and connectionist
approaches, as discussed in Section 4.1. Neuronal indicates the usage of
neurons which are very closely related to biological neurons. In connectionist
approaches there is no claim to neurobiological plausibility; instead,
general artificial neural network architectures are used. Depending on their
architecture, they can be split into standard and non-standard networks.
Furthermore, we can distinguish local and distributed representation of the
knowledge, which will also be discussed in more detail in Section 4.1.

Figure 6. Main Axes

Figure 7. Interrelation (axis ranging from integrated to hybrid)

Note that the subdivisions belonging to the interrelation axis are again 
independent of each other. They should be understood as independent sub- 
dimensions, and could also be depicted this way by using further coordinate 
axes. We hope that our simplified visualisation makes it easier to maintain 
an overview. But to be pedantic, for our presentation we actually understand
the neuronal-connectionist dimension as a subdivision of integrated
systems, and the distributed-local and standard-nonstandard dimensions
as independent subdivisions of connectionist systems - simply because this
currently suffices for classification.



Figure 8. Language

Figure 8 depicts the second axis in our scheme. Here, the systems are 
divided according to the language used in their symbolic part. We dis- 
tinguish between symbolic and logical languages. Symbolic approaches in- 
clude the relation to automata as in [McCulloch and Pitts, 1943], to gram- 
mars [Elman, 1990, Fletcher, 2001] or to the storage and retrieval of terms 
[Pollack, 1990], whereas the logical approaches require either propositional 
or first-order logic systems, as e.g. in [Holldobler and Kalinke, 1994] and
discussed in Section 2.4. The language axis will be discussed in more detail 
in Section 4.2. 
Figure 9. Usage



Most systems focus on one or only a few aspects of the neural-symbolic 
learning cycle depicted in Figure 1, i.e. either the representation of symbolic 
knowledge within a connectionist setting, or the training of preinitialized 
networks, or the extraction of symbolic systems from a network. Depending 
on this main focus we can distinguish the systems as shown in Figure 9. 
The issues of extraction vs. representation on the one hand and learning 
vs. reasoning on the other hand, are discussed in Section 4.3. Systems may 
certainly cover several or all of these aspects, i.e. they may span whole
subdimensions.

4 Dimensions of Neural Symbolic Integration 

In this section, we will survey main research results in this area by classifying 
them according to eight dimensions, marked by the arrows in Figures 7-9. 

• Interrelation 

1. Integrated versus hybrid 

2. Neuronal versus connectionist 

3. Local versus distributed 

4. Standard versus nonstandard 

• Language 

5. Symbolic versus logical 

6. Propositional versus first-order 

• Usage 

7. Extraction versus representation 

8. Learning versus reasoning 

As discussed above, we believe that these dimensions mark the main points 
of distinction between different integrated neural-symbolic systems. The 
chapter is structured accordingly, examining each of the dimensions in turn. 

4.1 Interrelation 
Integrated versus Hybrid 

This section serves to further clarify what we understand by neural-symbolic 
integration. Following the rationale laid out in the introduction, we under- 
stand why it is desirable to combine symbolic and connectionist approaches, 
and there are obviously several ways how this can be done. From a bird's 
eye view, we can distinguish two main paradigms, which we call hybrid and 
integrated (or following [Hilario, 1995], unified) systems, and this survey is 
concerned with the latter. 

Hybrid systems are characterized by the fact that they combine two or
more problem-solving techniques, which run in parallel, in order to address
a problem, as depicted in Figure 10 (left).

An integrated neural-symbolic system differs from a hybrid one in that
it consists of one connectionist main component in which symbolic knowledge
is processed, see Figure 10 (right). Integrated systems are sometimes
also referred to as embedded or monolithic hybrid systems, cf. [Sun, 2001].
Examples of integrated systems are those presented in Sections 2.2-2.4.

Figure 10. Hybrid (left) versus integrated (right) architecture.

For either architecture, one of the central issues is the representation of 
symbolic data in connectionist form [Bader et al., 2004b]. For the hybrid
system, these transformations are required for passing information between 
the components. The integrated architecture must implicitly or explicitly 
deal with symbolic data by connectionist means, i.e. must also be capable 
of similar transformations. 

This survey covers integrated systems only, the study of which appears to 
be particularly challenging. For recent selective overview literature see e.g.
[Browne and Sun, 2001, Garcez et al., 2002a, Bader et al., 2004b]. The first,
[Browne and Sun, 2001], focuses on reasoning systems. The field of propositional
logic is thoroughly covered in [Garcez et al., 2002a], where the authors
revisit the approach of [Holldobler and Kalinke, 1994] and explain
their extensions including applications to real world problems, like fault
diagnosis. In [Bader et al., 2004b] the emphasis is on the challenge problems
arising from first-order neural-symbolic integration.

Neuronal versus Connectionist 

There are two driving forces behind the field of neural-symbolic integration: 
On the one hand it is the striving for an understanding of human cognition, 
and on the other it is the vision of combining connectionist and symbolic ar- 
tificial intelligence technology in order to arrive at more powerful reasoning 
and learning systems for computer science applications. 

In [McCulloch and Pitts, 1943] the motivation for the study was to un- 
derstand human cognition, i.e. to pursue the question how higher cognitive 
- logical - processes can be performed by artificial neural networks. In this 
line of research, the question of biological feasibility of a network
architecture is prominent, and inspiration is often taken from biological
counterparts.

The SHRUTI system [Shastri and Ajjanagadde, 1993] as described in
Section 2.3, for example, addresses the question how it is possible that 
biological networks perform certain reasoning tasks very quickly. Indeed, 
for some complex recognition tasks which involve reasoning capabilities, 
human responses occur sometimes at reflexive speed, particularly within a 
time span which allows processing through very few neuron layers only. As 
mentioned above, time-synchronization was used for the encoding of vari- 
able binding in SHRUTI. 

The recently developed spiking neuron networks [Maass, 2002] take an
even more realistic approach to the modelling of temporal aspects of neural
activity. Neurons, in this context, are considered to be firing so-called
spike trains, which consist of patterns of firing impulses over certain time 
intervals. The complex propagation patterns within a network are usu- 
ally analysed by statistical methods. The encoding of symbolic knowledge 
using such temporal aspects has hardly been studied so far, an excep- 
tion being [Sougne, 2001]. We perceive it as an important research chal- 
lenge to relate the neurally plausible spiking neurons approach to neural- 
symbolic integration research. To date, however, only a few preliminary 
results on computational aspects of spiking neurons have been obtained 
[Natschlager and Maass, 2002, Maass and Markram, 2004, Maass et al., 2005].

Another recent publication, [van der Velde and de Kamps, 2005], shows 
how natural language could be encoded using biologically plausible mod- 
els of neural networks. The results appear to be suitable for the study 
of neural-symbolic integration, but it remains to be investigated to which 
extent the provided approach can be transferred to symbolic reasoning.
Similarly inspiring might be the recent book [Hawkins and Blakeslee, 2004] and
accompanying work, though it discusses neural-symbolic relationships on a 
very abstract level only. 

The lines of research just reviewed take their major motivation from 
the goal to achieve biologically plausible behaviour or architectures. As 
already mentioned, neural-symbolic integration can also be pursued from 
a more technically motivated perspective, driven by the goal to combine 
the advantages of symbolic and connectionist approaches by studying their 
interrelationships. The work on the Core Method, discussed in Section 2.4, 
can be subsumed under this technologically inspired perspective. 

Local versus Distributed Representation of Knowledge 

For integrated neural-symbolic systems, a crucial question is how symbolic
knowledge is represented within the connectionist system. If standard
networks are trained using backpropagation, the knowledge acquired
during the learning process is spread over the network in diffuse ways, i.e.
it is in general not easy, or even possible, to identify one or a small number
of nodes whose activations contain and process a certain symbolic piece of
knowledge.

The RAAM architecture and its variants, as discussed in Section 2.2, are
clearly based on distributed representations. Technically, this stems from 
the fact that the representation is initially learned, and no explicit algorithm 
for translating symbolic knowledge into the connectionist setting is being 
used. 

Most other approaches to neural-symbolic integration, however, represent
data locally. SHRUTI (Section 2.3) associates a defined node assembly
with each logical predicate, and the architecture does not allow for distributed
representation. The approaches for propositional connectionist model
generation using the Core Method (Section 2.4) encode propositional variables
as single nodes in the input and output layers, and logical formulae (rules)
as single nodes in the hidden layer of the network.
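
The local encoding just described can be illustrated by a minimal sketch. It assumes binary threshold units, weights of +1 and -1, and thresholds chosen so that a hidden unit fires exactly when the body of its rule is satisfied; the example program, the function names and the data layout are hypothetical and are not taken from the cited systems.

```python
from typing import Dict, List, Set, Tuple

# A clause is (head, positive_body_atoms, negated_body_atoms).
Clause = Tuple[str, List[str], List[str]]


def build_network(program: List[Clause]):
    """Create one hidden unit per clause (a purely local representation)."""
    atoms = sorted({a for h, pos, neg in program for a in [h, *pos, *neg]})
    hidden = []
    for head, pos, neg in program:
        # Weight +1 from each positive body atom, -1 from each negated one.
        weights: Dict[str, float] = {a: 1.0 for a in pos}
        weights.update({a: -1.0 for a in neg})
        # Fires only if all positive atoms are active and no negated one is.
        threshold = len(pos) - 0.5
        hidden.append((weights, threshold, head))
    return atoms, hidden


def step(hidden, interpretation: Set[str]) -> Set[str]:
    """One forward pass: the immediate consequences of `interpretation`."""
    output: Set[str] = set()
    for weights, threshold, head in hidden:
        activation = sum(w for atom, w in weights.items() if atom in interpretation)
        if activation > threshold:
            # The output unit for `head` acts as an OR over its clause units.
            output.add(head)
    return output


def iterate(hidden, start: Set[str], max_steps: int = 100) -> Set[str]:
    """Feed the output layer back to the input layer until nothing changes."""
    current = set(start)
    for _ in range(max_steps):
        nxt = step(hidden, current)
        if nxt == current:
            return current
        current = nxt
    raise RuntimeError("no fixed point reached within max_steps")


# Hypothetical program:  a.   b <- a.   c <- a, not d.   d <- e.
program = [("a", [], []), ("b", ["a"], []), ("c", ["a"], ["d"]), ("d", ["e"], [])]
atoms, hidden = build_network(program)
model = iterate(hidden, set())
```

Every propositional variable and every rule corresponds to exactly one unit here, which is what makes the representation local: the role of each node can be read off the program directly.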

The design of distributed encodings of symbolic data appears to be
particularly challenging. It also appears to be one of the major bottlenecks
in producing applicable integrated neural-symbolic systems with learning 
and reasoning abilities [Bader et al., 2004b]. This becomes apparent e.g. 
in the difficulties faced by the first-order logic programming approaches 
discussed in Section 2.4. Therein, symbolic entities are not represented 
directly. Instead, interpretations (i.e. valuations) of the logic are being 
represented, which contain truth value assignments to language constructs. 
Concrete representations, as developed in [Bader et al., 2005b], distribute 
the encoding of the interpretations over several nodes, but in a diffuse way. 
The encoding thus results in a distributed representation. Similar
considerations apply to the recent proposal [Gust and Kühnberger, 2005], where
first-order logic is first converted into variable-free form (using topoi from 
category theory), and then fed to a neural network for training. 

Standard versus Non-standard Network Architecture

Even though neural networks are a widely accepted paradigm in AI, it is hard
to make out a standard architecture. However, all so-called standard-architecture
systems agree at least on the following:

• only real numbers are propagated along the connections 

• units compute very simple functions only 

• all units behave similarly (i.e. they use similar simple functions and 
the activation values are always within a small range) 

• only simple recursive structures are used (e.g. connecting only the
output back to the input layer, or using self-recursive units only)

When adhering to these standard design principles, powerful learning
techniques such as backpropagation [Rumelhart et al, 1986] or Hebbian
Learning [Hebb, 1949] can be used to train the networks, which makes them
applicable to real world problems.

However, these standard architectures do not easily lend themselves to 
neural-symbolic integration. In general, it is easier to use non-standard 
architectures in order to represent and work with structured knowledge, 
with the drawback that powerful learning abilities are often lost. 

Neural-symbolic approaches using standard networks include the CILP
system [Garcez and Zaverucha, 1999], KBANN [Towell and Shavlik, 1994],
RAAM (Section 2.2) and [Seda and Lane, 2005] (Section 2.4). Usually, they
consist of a layered network with three layers (or more, in the case of
KBANN) and use sigmoidal units. For these systems, experimental
results are available showing their learning capabilities. As discussed
above, these systems are able to handle propositional knowledge (or
first-order knowledge with a finite domain). Similar observations can be made about the
standard architectures used in [Holldobler et al, 1999a, Hitzler et al, 2004,
Bader et al, 2005b] for first-order neural-symbolic integration.

Non-standard networks were used e.g. in the SHRUTI system [Shastri and Ajjanagadde, 1993],
and in the approaches described in [Bader and Hitzler, 2004] and [Bader et al, 2005a].
In all these implementations non-standard units and non-standard
architectures were used, and hence none of the usual learning techniques are
applicable. However, for the SHRUTI system limited learning techniques
based on Hebbian Learning [Hebb, 1949] were developed [Shastri, 2002,
Shastri and Wendelken, 2003, Wendelken and Shastri, 2003].

4.2 Language

Symbolic versus Logical

One of the motivations for studying neural-symbolic integration is to com- 
bine connectionist learning capabilities with symbolic knowledge processing, 
as already mentioned. While our main interest is in pursuing logical aspects 
of symbolic knowledge, this is not necessarily always the main focus of in- 
vestigations. 

Work on representing automata or weighted automata [Kleene, 1956,
McCulloch and Pitts, 1943, Bader et al, 2004a] (Section 2.1) using artificial
neural networks, for example, focuses on computationally relevant
structures, such as automata, and not directly on logically encoded knowledge.
Nevertheless, such investigations show how to deal with structural knowl- 
edge within a connectionist setting, and can serve as inspiration for corre- 
sponding research on logical knowledge. 

Recursive autoassociative memory (RAAM) and its variants, as discussed
in Section 2.2, deal with terms only, and not directly with logical
content. RAAM allows connectionist encodings of first-order terms,
where the underlying idea is to present terms or term trees sequentially to 
a connectionist system which is trained to produce a compressed encoding 
characterized by the activation pattern of a small collection of nodes. To 
date, storage capacity is very limited, and connectionist processing of the 
stored knowledge has not yet been investigated in detail. 
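
The recursive compression idea behind RAAM can be sketched as follows. This is only an illustration of the data flow: the weights are random and untrained (a real RAAM learns them by autoassociation together with a decoder), and the dimension, the leaf codes and all names are assumptions made for the example.

```python
import math
import random
from typing import Dict, List, Tuple, Union

# A term tree is either a leaf symbol or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]

DIM = 4  # assumed size of the compressed code
rng = random.Random(0)

# Untrained stand-in for the encoder weights: maps 2*DIM inputs to DIM outputs.
W_ENC: List[List[float]] = [
    [rng.uniform(-1.0, 1.0) for _ in range(2 * DIM)] for _ in range(DIM)
]

# Hypothetical one-hot codes for the leaf symbols.
LEAVES: Dict[str, List[float]] = {
    "a": [1.0, 0.0, 0.0, 0.0],
    "b": [0.0, 1.0, 0.0, 0.0],
    "c": [0.0, 0.0, 1.0, 0.0],
    "d": [0.0, 0.0, 0.0, 1.0],
}


def compress(left: List[float], right: List[float]) -> List[float]:
    """Squash two child codes into one code of the same width."""
    joined = left + right
    return [math.tanh(sum(w * v for w, v in zip(row, joined))) for row in W_ENC]


def encode(tree: Tree) -> List[float]:
    """Encode a tree bottom-up; every subtree ends up as a DIM-wide pattern."""
    if isinstance(tree, str):
        return LEAVES[tree]
    left, right = tree
    return compress(encode(left), encode(right))


# The nested term ((a, b), (c, (a, d))) becomes a single activation pattern.
code = encode((("a", "b"), ("c", ("a", "d"))))
```

However deep the term, its code has a fixed width, which gives an intuition for why storage capacity is limited.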

A considerable body of work exists on the connectionist processing and 
learning of structured data using recurrent networks [Sperduti et al, 1995, 
Sperduti et al, 1997, Frasconi et al, 2001, Hammer, 2002, Hammer, 2003, 
Hammer et al., 2004a, Hammer et al., 2004b]. The focus is on tree repre- 
sentations and manipulation of the data. 

[Holldobler et al., 1997, Kalinke and Lehmann, 1998] study the representation
of counters using recurrent networks, and connectionist unification
algorithms as studied in [Holldobler, 1990, Holldobler and Kurfess, 1992,
Holldobler, 1993] are designed for manipulating terms, but already in a
clearly logical context. The representation of grammars [Giles et al, 1991] 
or more generally of natural language constructs [van der Velde and de Kamps, 2005] 
also has a clearly symbolic (as opposed to logical) focus. 

It remains to be seen, however, to what extent the work on connectionist 
processing of structured data can be reused in logical contexts for creating 
integrated neural-symbolic systems with reasoning capabilities. Integrated 
reasoning systems like the ones presented in Sections 2.3 and 2.4 currently 
lack the capabilities of the term-based systems, so that a merging of these 
efforts appears to be a promising albeit challenging goal. 

Propositional versus First-Order 

Logic-based integrated neural-symbolic systems differ as to the knowledge
representation language they support. Concerning the capabilities
of the systems, a major distinction needs to be made between those
which deal with propositional logics, and those based on first-order predicate 
(and related) logics. 

What we mean by propositional logics in this context includes propositional
modal, temporal, non-monotonic, and other non-classical logics. One
of their characteristic features, which distinguishes them from first-order
logics for neural-symbolic integration, is the fact that they are of a finitary
nature: propositional theories in practice involve only a finite number of 
propositional variables, and corresponding models are also finite. Also, so- 
phisticated symbol processing as needed for nested terms in the form of 
substitutions or unification is not required. 

Due to their finiteness it is thus fairly easy to implement propositional 
logic programs using neural networks [Holldobler and Kalinke, 1994]
(Section 2.4). A considerable body of work deals with the extension of this
approach to non-classical logics [Garcez et al, 2005, Garcez et al, 2000, Garcez et al, 2002b,
Garcez et al, 2003, Garcez et al, 2004a, Garcez et al, 2004b, Garcez and Lamb, 200x]. 
This includes modal, intuitionistic, and argumentation-theoretic approaches, 
amongst others. Earlier work on representing propositional logics is based 
on Hopfield networks [Pinkas, 1991b, Pinkas, 1991a] but has not been fol- 
lowed up on recently. 

In contrast to this, predicate logics - which for us also include modal,
non-monotonic, etc. extensions - in general allow function symbols to be
used as language primitives. Consequently, it is possible to use terms of
arbitrary depth, and models necessarily assign truth values to an infinite
number of ground atoms. The difficulty in dealing with this in a connectionist
setting lies in the finiteness of neural networks, which makes it necessary
to capture the infinitary aspects of predicate logics by finite means. The first-order
approaches presented in [Holldobler et al, 1999a, Hitzler and Seda, 2000, 
Bader and Hitzler, 2004, Hitzler et al, 2004, Bader et al, 2005a, Bader et al, 2005b]
(Section 2.4) solve this problem by using encodings of infinite sets by real 
numbers, and representing them in an approximate manner. They can also 
be carried over to non-monotonic logics [Hitzler, 2004]. 
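
The idea of encoding an infinite set of ground atoms as a single real number, and of approximating it by finite means, can be sketched as follows. The sketch assumes an injective level mapping that assigns a positive integer to every ground atom, and uses base 4; the base, the example program and all function names are assumptions made for illustration.

```python
from fractions import Fraction
from typing import Callable, Iterable

BASE = 4  # assumed base; any base > 2 keeps distinct interpretations apart


def embed(interpretation: Iterable[str], level: Callable[[str], int]) -> Fraction:
    """Map a set of ground atoms I to the sum of BASE**(-level(A)) for A in I."""
    return sum(
        (Fraction(1, BASE ** level(a)) for a in set(interpretation)), Fraction(0)
    )


def nat(n: int) -> str:
    """Build the term s(s(...s(0)...)) with n applications of s."""
    return "0" if n == 0 else f"s({nat(n - 1)})"


def level(atom: str) -> int:
    """Hypothetical injective level mapping: even(s^n(0)) has level n + 1."""
    return atom.count("s(") + 1


def approximate(depth: int) -> Fraction:
    """Embed the true atoms up to `depth` of:  even(0).  even(s(s(X))) <- even(X)."""
    return embed((f"even({nat(n)})" for n in range(0, depth, 2)), level)


# The infinite model {even(0), even(s(s(0))), ...} corresponds to
# 1/4 + 1/64 + 1/1024 + ... = 4/15; finite truncations approach this value.
exact = Fraction(4, 15)
error_at_depth_10 = exact - approximate(10)
```

Truncating the term depth yields a finite object whose distance from the exact value shrinks geometrically, which is the sense in which the infinite interpretation is represented in an approximate manner.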

[Bader et al, 2005a], which builds on [Garcez and Gabbay, 2004] and [Gabbay, 1999],
uses an alternative mechanism in which unification of terms is controlled via 
fibrings. More precisely, certain network constructs encode the matching of 
terms and act as gates to the firing of neurons whenever corresponding 
symbolic matching is achieved. 

A prominent subproblem in first-order neural-symbolic integration is that 
of variable binding. It refers to the fact that the same variable may occur 
in several places in a formula, or that during a reasoning process variables 
may be bound to instantiate certain terms. In a connectionist setting, dif- 
ferent parts of formulae and different individuals or terms are usually repre- 
sented independently of each other within the system. The neural network 
paradigm, however, forces subnets to be blind with respect to detailed acti- 
vation patterns in other subnets, and thus does not lend itself easily to the 
processing of variable bindings. 

Research on first-order neural-symbolic integration has led to different 
means of dealing with the variable binding problem. One of them is to use 
temporal synchrony to achieve the binding. This is encoded in the SHRUTI 
system (Section 2.3), where the synchronous firing of variable nodes with 
constant nodes encodes a corresponding binding. Other approaches, as 
discussed in [Browne and Sun, 1999], encode binding by relating the prop- 
agated activations, i.e. real numbers. 
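
Binding by temporal synchrony can be illustrated with a toy sketch. Each entity is assumed to fire in its own phase of an oscillation cycle, an argument node is bound to an entity when it fires in the same phase, and a rule simply passes the phase from an antecedent argument to a consequent argument. The predicate names, the rule and the data structures are hypothetical; this is not the SHRUTI implementation.

```python
from typing import Dict, List, Tuple


def fire_phases(bindings: Dict[str, str], entity_phase: Dict[str, int]) -> Dict[str, int]:
    """Let every bound argument node fire in the phase of its entity."""
    return {arg: entity_phase[entity] for arg, entity in bindings.items()}


def propagate(phases: Dict[str, int], links: List[Tuple[str, str]]) -> Dict[str, int]:
    """Spread phases along rule links until no further node starts firing."""
    out = dict(phases)
    changed = True
    while changed:
        changed = False
        for src, dst in links:
            if src in out and dst not in out:
                out[dst] = out[src]  # dst now fires in synchrony with src
                changed = True
    return out


def read_bindings(phases: Dict[str, int], entity_phase: Dict[str, int]) -> Dict[str, str]:
    """Decode bindings: an argument is bound to the entity sharing its phase."""
    phase_entity = {p: e for e, p in entity_phase.items()}
    return {arg: phase_entity[p] for arg, p in phases.items()}


# Each entity owns one phase slot within the cycle.
entity_phase = {"john": 0, "mary": 1, "book": 2}

# Fact: give(john, mary, book).  Rule: give(x, y, z) -> own(y, z).
fact = {"give.giver": "john", "give.recipient": "mary", "give.object": "book"}
rule_links = [("give.recipient", "own.owner"), ("give.object", "own.thing")]

phases = propagate(fire_phases(fact, entity_phase), rule_links)
inferred = read_bindings(phases, entity_phase)
```

In this toy example, `inferred` binds `own.owner` to `mary` and `own.thing` to `book` without any node ever storing a pointer to an entity; the binding exists only as a coincidence of firing phases.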

Other systems avoid the variable binding problem by converting predicate
logical formulae into variable-free representations. The approaches in
[Holldobler et al, 1999a, Hitzler and Seda, 2000, Hitzler et al, 2004, Hitzler, 2004,
Seda, 2005, Seda and Lane, 2005, Bader et al, 2005a, Bader et al, 2005b]
(Section 2.4) make conversions to (infinite) propositional theories, which
are then approximated. [Gust and Kühnberger, 2005] use topos theory instead.

It should be noted here that SHRUTI (Section 2.3) addresses the variable
binding problem, but allows only a very limited fragment of first-order
predicate logic to be encoded [Holldobler et al, 1999b]. In particular, it
cannot deal with function symbols, and thus could still be understood as
a finitary fragment of predicate logic.

4.3 Usage 

Extraction versus Representation 

The representation of symbolic knowledge is necessary even for classical 
applications of connectionist learning. As an example, consider the neural- 
networks-based Backgammon playing program TD-Gammon [Tesauro, 1995], 
which achieves professional players' strength by temporal difference learn- 
ing on data created by playing against itself. TD-Gammon represents the 
Backgammon board in a straightforward way, by encoding the squares and 
placement of pieces via assemblies of nodes, thus representing the structured 
knowledge of a board situation directly by a certain activation pattern of 
the input nodes. 

In this and other classical application cases the represented symbolic 
knowledge is not of a complex logical nature. Neural-symbolic integration, 
however, attempts to achieve connectionist processing of complex logical 
knowledge, learning, and inferences, and thus the question how to represent 
logical knowledge bases in suitable form becomes dominant. Different forms 
of representation have already been discussed in the context of local versus 
distributed representations. 

Returning to the TD-Gammon example, we would also be interested in
the complex knowledge acquired by TD-Gammon during the learning process,
which encodes the strategies with which this program beats human players.
If such knowledge could be extracted in symbolic form, it could be used
for further symbolic processing using inference engines or other
knowledge-based systems.

It is apparent that both the representation and the extraction of knowledge
are of importance for integrated neural-symbolic systems. They are
needed for closing the neural-symbolic learning cycle (Figure 1). However, 
they are also of independent interest, and are often studied separately. 

As for the representation of knowledge, this component is present in all
systems presented so far. The choice of how representation is done often
determines whether standard architectures are used, whether a local or
distributed approach is taken, and whether standard learning algorithms
can be employed.

A large body of work exists on extracting knowledge from trained networks,
usually focusing on the extraction of rules. [Jacobsson, 2005] gives a
recent overview of extraction methods. A method from 1992 [Giles et al., 1991],
which extracts a grammar represented as a finite state machine from a
trained recurrent neural network, is still up to date.
[McGarry et al, 1999] show how to extract rules from radial basis
function networks by identifying minimal and maximal activation values.
Some of the other efforts are reported in [Towell and Shavlik, 1993,
Andrews et al, 1995, Bologna, 2000, Garcez et al., 2001, Lehmann et al., 2005].
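
Extracting a finite state machine from a recurrent network can be illustrated by a simplified sketch: the continuous hidden state is quantised into a few bins, and the reachable bins are explored breadth-first, one input symbol at a time. The recurrent unit below is hand-wired rather than trained, and the uniform quantisation and all names are assumptions; the sketch conveys the general flavour of such methods and does not reproduce any of the cited algorithms.

```python
import math
from collections import deque
from typing import Callable, Dict, Sequence, Tuple


def rnn_step(h: float, x: int) -> float:
    """Hand-wired stand-in for a trained recurrent unit (a soft XOR of h and x)."""
    return 1.0 / (1.0 + math.exp(-10.0 * (h + x - 2.0 * h * x - 0.5)))


def extract_fsm(
    step: Callable[[float, int], float],
    h0: float,
    alphabet: Sequence[int],
    bins: int,
) -> Tuple[int, Dict[Tuple[int, int], int]]:
    """Quantise the state space into `bins` cells and explore them breadth-first."""

    def quantise(h: float) -> int:
        return min(int(h * bins), bins - 1)

    start = quantise(h0)
    representative = {start: h0}  # one continuous state per discovered cell
    transitions: Dict[Tuple[int, int], int] = {}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for x in alphabet:
            h_next = step(representative[q], x)
            q_next = quantise(h_next)
            transitions[(q, x)] = q_next
            if q_next not in representative:
                representative[q_next] = h_next
                queue.append(q_next)
    return start, transitions


# With two cells, the extracted machine is the two-state parity automaton.
start_state, fsm = extract_fsm(rnn_step, h0=0.0, alphabet=(0, 1), bins=2)
```

The extracted transition table is a symbolic object that can be inspected or minimised with standard automata techniques, whereas the same behaviour in the network is only implicit in its weights.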

It shall be noted that only a few systems have been proposed to date 
which include representation, learning, and extraction capabilities in a mean- 
ingful way, one of them being CILP [Garcez et al., 1997, Garcez and Zaverucha, 1999, 
Garcez et al, 2001]. It is to date a difficult research challenge to provide 
similar functionalities in a first-order setting. 

Learning versus Reasoning 

Ultimately, our goal should be to produce an effective AI system with
added reasoning and learning capabilities, as recently pointed out by Valiant 
[Valiant, 2003] as a key challenge for computer science. It turns out that 
most current systems have either learning capabilities or reasoning capabil- 
ities, but rarely both. SHRUTI (Section 2.3), for example, is a reasoning 
system with very limited learning support. 

In order to advance the state of the art in the sense of Valiant's vision
mentioned above, it will be necessary to build systems with combined
capabilities. In particular, learning should not be independent of reasoning,
i.e. initial knowledge and logical consequences thereof should help guide
the learning process. There is no system to date which realizes this in any
way, and new ideas will be needed to attack this problem.

5 Conclusions and Further Work 

Intelligent systems based on symbolic knowledge processing, on the one 
hand, and on artificial neural networks, on the other, differ substantially. 
Nevertheless, these are both standard approaches to artificial intelligence 
and it would be very desirable to combine the robustness of neural net- 
works with the expressivity of symbolic knowledge representation. This is 
the reason why the importance of the efforts to bridge the gap between 
the connectionist and symbolic paradigms of Artificial Intelligence has been 
widely recognised. As the amount of hybrid data containing symbolic and 
statistical elements as well as noise increases in diverse areas such as
bioinformatics or text and web mining, neural-symbolic learning and reasoning
becomes of particular practical importance. Nevertheless, this is not an
easy task, as illustrated in this survey.

The merging of theory (background knowledge) and data learning (learning
from examples) in neural networks has been indicated to provide learning
systems that are more effective than purely symbolic or purely connectionist
systems, especially when data are noisy [Garcez and Zaverucha, 1999].
This has contributed decisively to the growing interest in developing neural- 
symbolic systems, i.e. hybrid systems based on neural networks that are 
capable of learning from examples and background knowledge, and of per- 
forming reasoning tasks in a massively parallel fashion. 

However, while symbolic knowledge representation is highly recursive and 
well understood from a declarative point of view, neural networks encode 
knowledge implicitly in their weights as a result of learning and general- 
isation from raw data, which are usually characterized by simple feature 
vectors. While significant theoretical progress has recently been made on 
knowledge representation and reasoning using neural networks, and on di- 
rect processing of symbolic and structured data using neural methods, the 
integration of neural computation and expressive logics such as first order 
logic is still in its early stages of methodological development. 

Concerning knowledge extraction, we know that neural networks have
been applied to a variety of real-world problems (e.g. in bioinformatics,
engineering, robotics), and that they have been particularly successful when
data are noisy. But entirely satisfactory methods for extracting symbolic knowledge
from such trained networks in terms of accuracy, efficiency, rule
comprehensibility, and soundness are still to be found. Moreover, problems
concerning the stability and learnability of recursive models currently impose
further restrictions on connectionist systems.

In order to advance the state of the art, we believe that it is necessary to 
look at the biological inspiration for neural-symbolic integration, to use more 
formal approaches for translating between the connectionist and symbolic 
paradigms, and to pay more attention to potential application scenarios. 

The general motivation for research in the field of neural-symbolic
integration arises from conceptual observations on the complementary
nature of symbolic and neural-network-based artificial intelligence
described above. This conceptual perspective is sufficient for justifying the 
mainly foundations-driven lines of research being undertaken in this area 
so far. However, it appears that this conceptual approach to the study of 
neural-symbolic integration has now reached an impasse which requires the 
identification of use cases and application scenarios in order to drive future 
research.

Indeed, the theory of integrated neural-symbolic systems has reached 
a quite mature state but has not been tested extensively so far on real 
application data. The current systems have been developed for the study of 
general principles, and are in general not suitable for real data or application 
scenarios that go beyond propositional logic. Nevertheless, these studies 
provide methods which can be exploited for the development of tools for use 
cases, and significant progress can now only be expected as a continuation 
of the fundamental research undertaken in the past. 

In particular, first-order neural-symbolic integration still remains a wide
open issue, where advances are very difficult, and it is very hard to judge
to date to what extent the theoretical approaches can work in practice. We 
believe that the development of use cases with varying levels of expressive 
complexity is, as a result, needed to drive the development of methods for 
neural-symbolic integration beyond propositional logic [Hitzler et al., 2005]. 